Learning to Classify Organic and Conventional Wheat – A Machine Learning Driven Approach Using the MeltDB 2.0 Metabolomics Analysis Platform

نویسندگان

  • Nikolas Kessler
  • Anja Bonte
  • Stefan P. Albaum
  • Paul Mäder
  • Monika Messmer
  • Alexander Goesmann
  • Karsten Niehaus
  • Georg Langenkämper
  • Tim W. Nattkemper
چکیده

We present results of our machine learning approach to the problem of classifying GC-MS data originating from wheat grains of different farming systems. The aim is to investigate the potential of learning algorithms to classify GC-MS data to be either from conventionally grown or from organically grown samples and considering different cultivars. The motivation of our work is rather obvious nowadays: increased demand for organic food in post-industrialized societies and the necessity to prove organic food authenticity. The background of our data set is given by up to 11 wheat cultivars that have been cultivated in both farming systems, organic and conventional, throughout 3 years. More than 300 GC-MS measurements were recorded and subsequently processed and analyzed in the MeltDB 2.0 metabolomics analysis platform, being briefly outlined in this paper. We further describe how unsupervised (t-SNE, PCA) and supervised (SVM) methods can be applied for sample visualization and classification. Our results clearly show that years have most and wheat cultivars have second-most influence on the metabolic composition of a sample. We can also show that for a given year and cultivar, organic and conventional cultivation can be distinguished by machine-learning algorithms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...

متن کامل

Forward kinematic analysis of planar parallel robots using a neural network-based approach optimized by machine learning

The forward kinematic problem of parallel robots is always considered as a challenge in the field of parallel robots due to the obtained nonlinear system of equations. In this paper, the forward kinematic problem of planar parallel robots in their workspace is investigated using a neural network based approach. In order to increase the accuracy of this method, the workspace of the parallel robo...

متن کامل

Debt Collection Industry: Machine Learning Approach

Businesses are increasingly interested in how big data, artificial intelligence, machine learning, and predictive analytics can be used to increase revenue, lower costs, and improve their business processes. In this paper, we describe how we have developed a data-driven machine learning method to optimize the collection process for a debt collection agency. Precisely speaking, we create a frame...

متن کامل

Drug Discovery Acceleration Using Digital Microfluidic Biochip Architecture and Computer-aided-design Flow

A Digital Microfluidic Biochip (DMFB) offers a promising platform for medical diagnostics, DNA sequencing, Polymerase Chain Reaction (PCR), and drug discovery and development. Conventional Drug discovery procedures require timely and costly manned experiments with a high degree of human errors with no guarantee of success. On the other hand, DMFB can be a great solution for miniaturization, int...

متن کامل

A Hybrid Machine Learning Method for Intrusion Detection

Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Already various techniques of artificial intelligence have been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 3  شماره 

صفحات  -

تاریخ انتشار 2015